
Collaborating Authors

Tsitsiklis and Van Roy



The Fixed Points of Off-Policy TD

Neural Information Processing Systems

Off-policy TD can fail to converge [Boyan, 1994] [Tsitsiklis and Van Roy, 1997]. This work (J. Zico Kolter, Poster T6) is about fixing off-policy TD. The basic idea is to reweight samples so that the TD solution has quality guarantees (and so that TD converges); the technical idea involves "filtered" states and the stationary distribution of the policy.
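To make the reweighting idea concrete, here is a minimal sketch of a linear TD(0) update in which each off-policy transition carries a weight. The paper's actual construction of the "filtered" distribution is not reproduced here; the `weights` argument is a placeholder for whatever reweighting scheme is chosen.

```python
import numpy as np

def reweighted_td0(phi, phi_next, rewards, weights, alpha=0.05, gamma=0.95):
    """Linear TD(0) where each transition carries a per-sample weight.

    phi, phi_next : (T, d) feature matrices for s_t and s_{t+1}
    rewards       : (T,) observed rewards
    weights       : (T,) sample weights -- placeholder for the paper's
                    reweighting toward a suitably 'filtered' distribution
    """
    T, d = phi.shape
    theta = np.zeros(d)
    for t in range(T):
        td_error = rewards[t] + gamma * phi_next[t] @ theta - phi[t] @ theta
        theta = theta + alpha * weights[t] * td_error * phi[t]
    return theta
```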


Optimal Stopping via Randomized Neural Networks

Herrera, Calypso, Krach, Florian, Ruyssen, Pierre, Teichmann, Josef

arXiv.org Machine Learning

This paper presents new machine learning approaches to approximate the solution of optimal stopping problems. The key idea of these methods is to use neural networks, where the hidden layers are generated randomly and only the last layer is trained, in order to approximate the continuation value. Our approaches are applicable for high dimensional problems where the existing approaches become increasingly impractical. In addition, since our approaches can be optimized using a simple linear regression, they are very easy to implement and theoretical guarantees can be provided. In Markovian examples our randomized reinforcement learning approach and in non-Markovian examples our randomized recurrent neural network approach outperform the state-of-the-art and other relevant machine learning approaches.
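Below is a minimal sketch of the "train only the last layer" idea inside a Longstaff-Schwartz-style backward induction, assuming simulated state paths and a payoff function are given. The hidden-layer width, tanh activation, and ridge penalty are illustrative choices, not the authors' exact architecture or training scheme.

```python
import numpy as np

def random_features(x, W, b):
    """Randomly generated hidden layer; only the readout is ever trained."""
    return np.tanh(x @ W + b)

def randomized_optimal_stopping(paths, payoff, discount, hidden=100, ridge=1e-3, seed=0):
    """Backward induction for a Bermudan-style stopping problem (sketch).

    paths    : (n_paths, n_steps + 1, d) simulated state trajectories
    payoff   : function mapping an (n_paths, d) state slice to immediate rewards
    discount : one-period discount factor
    """
    rng = np.random.default_rng(seed)
    n_paths, n_steps_plus_1, d = paths.shape
    n_steps = n_steps_plus_1 - 1
    W = rng.normal(scale=1.0 / np.sqrt(d), size=(d, hidden))  # fixed random weights
    b = rng.normal(size=hidden)

    value = payoff(paths[:, -1, :])            # value at the final date
    for t in range(n_steps - 1, 0, -1):
        x = paths[:, t, :]
        feats = random_features(x, W, b)
        target = discount * value              # discounted value if we continue
        # Ridge regression for the readout layer (the only trained parameters).
        A = feats.T @ feats + ridge * np.eye(hidden)
        beta = np.linalg.solve(A, feats.T @ target)
        continuation = feats @ beta
        exercise = payoff(x)
        stop = exercise > continuation
        value = np.where(stop, exercise, discount * value)
    # Assume no exercise at time 0: the time-0 value is the discounted mean.
    return discount * value.mean()
```

Because the hidden weights are never updated, each exercise date only requires solving one small linear system, which is why the approach reduces to a sequence of linear regressions even when the state dimension is large.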


A Finite Time Analysis of Temporal Difference Learning With Linear Function Approximation

Bhandari, Jalaj, Russo, Daniel, Singal, Raghav

arXiv.org Machine Learning

Temporal difference learning (TD) is a simple iterative algorithm used to estimate the value function corresponding to a given policy in a Markov decision process. Although TD is one of the most widely used algorithms in reinforcement learning, its theoretical analysis has proved challenging and few guarantees on its statistical efficiency are available. In this work, we provide a simple and explicit finite time analysis of temporal difference learning with linear function approximation. Except for a few key insights, our analysis mirrors standard techniques for analyzing stochastic gradient descent algorithms, and therefore inherits the simplicity and elegance of that literature. A final section of the paper shows that all of our main results extend to the study of Q-learning applied to high-dimensional optimal stopping problems.
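For reference, the iteration being analyzed is the standard TD(0) update with linear features. The sketch below shows the algorithm itself in plain NumPy, with an optional projection step of the kind used in some finite-time analyses; it does not reproduce the paper's constants or proofs.

```python
import numpy as np

def td0_linear(trajectory, features, d, alpha=0.05, gamma=0.99, radius=None):
    """TD(0) with linear value-function approximation (minimal sketch).

    trajectory : iterable of (state, reward, next_state) transitions generated
                 by following the evaluated policy in the Markov chain
    features   : function state -> feature vector phi(s) of dimension d
    radius     : optional l2 radius; the projected variant keeps theta in a ball
    """
    theta = np.zeros(d)
    for state, reward, next_state in trajectory:
        phi, phi_next = features(state), features(next_state)
        td_error = reward + gamma * phi_next @ theta - phi @ theta
        theta = theta + alpha * td_error * phi
        if radius is not None:
            norm = np.linalg.norm(theta)
            if norm > radius:
                theta *= radius / norm
    # The fitted value estimate is V(s) ~ features(s) @ theta.
    return theta
```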


On TD(0) with function approximation: Concentration bounds and a centered variant with exponential convergence

Korda, Nathaniel, Prashanth, L. A.

arXiv.org Machine Learning

We provide non-asymptotic bounds for the well-known temporal difference learning algorithm TD(0) with linear function approximators. These include high-probability bounds as well as bounds in expectation. Our analysis suggests that a step-size inversely proportional to the number of iterations cannot guarantee optimal rate of convergence unless we assume (partial) knowledge of the stationary distribution for the Markov chain underlying the policy considered. We also provide bounds for the iterate averaged TD(0) variant, which gets rid of the step-size dependency while exhibiting the optimal rate of convergence. Furthermore, we propose a variant of TD(0) with linear approximators that incorporates a centering sequence, and establish that it exhibits an exponential rate of convergence in expectation. We demonstrate the usefulness of our bounds on two synthetic experimental settings.
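A hedged sketch of the iterate-averaged variant mentioned above: run ordinary TD(0) with a decaying step size and report the Polyak-Ruppert average of the iterates, which is the quantity the averaged-variant bounds concern. The c / sqrt(t) schedule here is an illustrative choice rather than the exact one analyzed in the paper.

```python
import numpy as np

def td0_iterate_averaged(trajectory, features, d, c=1.0, gamma=0.99):
    """TD(0) with Polyak-Ruppert iterate averaging (sketch).

    trajectory : iterable of (state, reward, next_state) transitions
    features   : function state -> feature vector phi(s) of dimension d
    """
    theta = np.zeros(d)
    theta_avg = np.zeros(d)
    for t, (state, reward, next_state) in enumerate(trajectory, start=1):
        phi, phi_next = features(state), features(next_state)
        td_error = reward + gamma * phi_next @ theta - phi @ theta
        theta = theta + (c / np.sqrt(t)) * td_error * phi
        theta_avg += (theta - theta_avg) / t   # running average of the iterates
    return theta_avg
```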


Analysis of Temporal-Difference Learning with Function Approximation

Tsitsiklis, John N., Van Roy, Benjamin

Neural Information Processing Systems

We present new results about the temporal-difference learning algorithm, as applied to approximating the cost-to-go function of a Markov chain using linear function approximators. The algorithm we analyze performs online updating of a parameter vector during a single endless trajectory of an aperiodic irreducible finite-state Markov chain. Results include convergence (with probability 1), a characterization of the limit of convergence, and a bound on the resulting approximation error. In addition to establishing new and stronger results than those previously available, our analysis is based on a new line of reasoning that provides new intuition about the dynamics of temporal-difference learning. Furthermore, we discuss the implications of two counterexamples with regard to the significance of online updating and linearly parameterized function approximators. The problem of predicting the expected long-term future cost (or reward) of a stochastic dynamic system manifests itself in both time-series prediction and control.
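For readers who want the shape of the result, the limit characterization and approximation-error bound are usually stated as below. This is the standard published form written from memory, with the notation assumed as follows: Phi is the feature matrix, Pi the projection onto its span in the stationary-distribution-weighted norm, T^(lambda) the TD(lambda) operator, alpha the discount factor, and J^mu the true cost-to-go function.

```latex
% Limit characterization and error bound for TD(lambda) with linear features
% (standard form; notation as described in the lead-in above).
\[
  \Phi\theta^{*} \;=\; \Pi\, T^{(\lambda)}\!\bigl(\Phi\theta^{*}\bigr),
  \qquad
  \bigl\|\Phi\theta^{*} - J^{\mu}\bigr\|_{D}
  \;\le\;
  \frac{1-\lambda\alpha}{1-\alpha}\,
  \bigl\|\Pi J^{\mu} - J^{\mu}\bigr\|_{D}.
\]
```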


Analysis of Temporal-Difference Learning with Function Approximation

Tsitsiklis, John N., Van Roy, Benjamin

Neural Information Processing Systems

The algorithm we analyze performs online updating of a parameter vector during a single endless trajectory of an aperiodic irreducible finite-state Markov chain. Results include convergence (with probability 1), a characterization of the limit of convergence, and a bound on the resulting approximation error. In addition to establishing new and stronger results than those previously available, our analysis is based on a new line of reasoning that provides new intuition about the dynamics of temporal-difference learning. Furthermore, we discuss the implications of two counterexamples with regard to the significance of online updating and linearly parameterized function approximators. The problem of predicting the expected long-term future cost (or reward) of a stochastic dynamic system manifests itself in both time-series prediction and control. An example in time-series prediction is that of estimating the net present value of a corporation, as a discounted sum of its future cash flows, based on the current state of its operations. In control, the ability to predict long-term future cost as a function of state enables the ranking of alternative states in order to guide decision-making. Indeed, such predictions constitute the cost-to-go function that is central to dynamic programming and optimal control (Bertsekas, 1995). Temporal-difference learning, originally proposed by Sutton (1988), is a method for approximating long-term future cost as a function of current state.